I2SRM: Intra- and Inter-Sample Relationship Modeling for Multimodal Information Extraction
Multimodal information extraction, which requires aggregating representations
from different modalities, is attracting increasing research attention. In this
paper, we present the Intra- and Inter-Sample Relationship Modeling (I2SRM)
method for this task, which contains two modules. Firstly, the intra-sample
relationship modeling module operates on a single sample and aims to learn
effective representations. Embeddings from textual and visual modalities are
shifted to bridge the modality gap caused by distinct pre-trained language and
image models. Secondly, the inter-sample relationship modeling module considers
relationships among multiple samples and focuses on capturing the interactions.
An AttnMixup strategy is proposed, which not only enables collaboration among
samples but also augments data to improve generalization. We conduct extensive
experiments on the multimodal named entity recognition datasets Twitter-2015
and Twitter-2017, and the multimodal relation extraction dataset MNRE. Our
proposed method I2SRM achieves competitive results: 77.12% F1-score on
Twitter-2015, 88.40% F1-score on Twitter-2017, and 84.12% F1-score on MNRE.
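The abstract does not spell out how AttnMixup combines samples, so the following is only a sketch of plain mixup, the generic augmentation that the name suggests it builds on, and not the paper's attention-based variant. The function name `mixup`, the toy vectors, and the fixed mixing weight `lam` are all assumptions made for illustration.

```python
def mixup(x_a, y_a, x_b, y_b, lam):
    """Convexly combine two samples (feature vectors and one-hot labels).

    Plain mixup only; the attention-based weighting of AttnMixup is not
    described in the abstract and is not reproduced here.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


# Hypothetical toy inputs: two 2-d feature vectors with one-hot labels.
x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.75)
```

In practice the mixing weight is usually sampled per batch (for example from a Beta distribution) rather than fixed, and the mixed label is used as a soft target.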
Asymmetric Polynomial Loss For Multi-Label Classification
Various tasks are reformulated as multi-label classification problems, in
which the binary cross-entropy (BCE) loss is frequently utilized for optimizing
well-designed models. However, the vanilla BCE loss cannot be tailored for
diverse tasks, resulting in suboptimal performance for different models.
Besides, the imbalance between redundant negative samples and rare positive
samples can degrade model performance. In this paper, we propose an
effective Asymmetric Polynomial Loss (APL) to mitigate the above issues.
Specifically, we first perform a Taylor expansion of the BCE loss. Then we refine
the coefficients of the polynomial functions. We further employ the asymmetric
focusing mechanism to decouple the gradient contribution from the negative and
positive samples. Moreover, we validate that the polynomial coefficients can
recalibrate the asymmetric focusing hyperparameters. Experiments on relation
extraction, text classification, and image classification show that our APL
loss can consistently improve performance without extra training burden.
Comment: ICASSP 202
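The Taylor-expansion step can be illustrated with the standard series -log(p) = sum over k >= 1 of (1 - p)^k / k, which recovers the positive-label BCE term when the coefficients are left at 1/k. The sketch below is an assumption-laden illustration, not the paper's exact APL formula: the function names, the per-term coefficient lists, and the focusing weights `(1 - p)**gamma_pos` and `p**gamma_neg` (in the style of asymmetric focusing losses) are hypothetical choices.

```python
import math


def taylor_coeffs(n_terms):
    """Coefficients 1/k of the series -log(p) = sum_{k>=1} (1 - p)**k / k."""
    return [1.0 / k for k in range(1, n_terms + 1)]


def poly_loss(p, label, coeffs_pos, coeffs_neg, gamma_pos=0.0, gamma_neg=0.0):
    """Polynomial surrogate of BCE for one predicted probability p in (0, 1).

    Positive labels expand -log(p) in powers of (1 - p); negative labels
    expand -log(1 - p) in powers of p. Separate gammas give an asymmetric
    focusing weight (an illustrative assumption, not the paper's formula).
    """
    if label == 1:
        base, coeffs, gamma = 1.0 - p, coeffs_pos, gamma_pos
    else:
        base, coeffs, gamma = p, coeffs_neg, gamma_neg
    series = sum(c * base ** k for k, c in enumerate(coeffs, start=1))
    return (base ** gamma) * series


# With unmodified Taylor coefficients and gamma = 0 this approximates BCE.
c = taylor_coeffs(60)
approx = poly_loss(0.9, 1, c, c)
exact = -math.log(0.9)
```

Changing the leading entries of `coeffs_pos` or `coeffs_neg` reweights the low-order polynomial terms, while a larger `gamma_neg` shrinks the contribution of easy negatives.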
Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creative
Accurately predicting conversions in advertisements is generally a
challenging task, because such conversions do not occur frequently. In this
paper, we propose a new framework to support creating high-performing ad
creatives, including the accurate prediction of ad creative text conversions
before delivering to the consumer. The proposed framework includes three key
ideas: multi-task learning, conditional attention, and attention highlighting.
Multi-task learning improves the prediction accuracy of conversions by
predicting clicks and conversions simultaneously, which addresses the
difficulty of data imbalance. Furthermore, conditional attention focuses
attention on each ad creative while taking its genre and target
gender into account, thus improving conversion prediction accuracy. Attention highlighting
visualizes important words and/or phrases based on conditional attention. We
evaluated the proposed framework with actual delivery history data (14,000
creatives displayed more than a certain number of times from Gunosy Inc.), and
confirmed that these ideas improve the prediction performance of conversions,
and visualize noteworthy words according to the creatives' attributes.
Comment: 9 pages, 6 figures. Accepted at The 25th ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD 2019) as an applied data science
pape
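The multi-task idea, predicting clicks and conversions together so that the frequent click signal helps the rare conversion signal, can be sketched as a weighted sum of two binary cross-entropy terms. This is a minimal illustration under stated assumptions: the function names and the task weights `w_click` and `w_conv` are hypothetical, and the abstract does not say how the two objectives are actually combined.

```python
import math


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multitask_loss(p_click, y_click, p_conv, y_conv, w_click=1.0, w_conv=1.0):
    """Joint objective over the click and conversion prediction heads.

    The equal default weights are an assumption for illustration only.
    """
    return w_click * bce(p_click, y_click) + w_conv * bce(p_conv, y_conv)


# Hypothetical example: a clicked creative that did not convert.
loss = multitask_loss(p_click=0.5, y_click=1, p_conv=0.5, y_conv=0)
```

In a real model both probabilities would come from heads that share a text encoder, so gradients from the click task also shape the representation used for conversions.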